Using Perfect Sampling in Parameter Estimation of a Whole Sentence Maximum Entropy Language Model
Authors
Abstract
The Maximum Entropy (ME) principle is an appropriate framework for combining information of a diverse nature from several sources into the same language model. In order to incorporate long-distance information into the ME framework in a language model, a Whole Sentence Maximum Entropy Language Model (WSME) can be used. Until now, Markov Chain Monte Carlo (MCMC) sampling techniques have been used to estimate the parameters of the WSME model. In this paper, we propose the application of another sampling technique: Perfect Sampling (PS). The experiments showed a reduction of 30% in the perplexity of the WSME model over the trigram model and a reduction of 2% over the WSME model trained with MCMC.

1 Introduction

The language modeling problem may be defined as the problem of calculating the probability of a string, $p(w) = p(w_1, \ldots, w_n)$. The probability $p(w)$ is usually calculated via conditional probabilities. The n-gram model is one of the most widely used language models. The power of the n-gram model resides in its simple formulation and its ease of training. On the other hand, n-grams only take into account local information, and important long-distance information contained in the string $w_1 \ldots w_n$ cannot be modeled by them. In an attempt to supplement the local information with long-distance information, hybrid models have been proposed, such as (Bellegarda, 1998; Chelba, 1998; Benedí and Sánchez, 2000).

* This work has been partially supported by the Spanish CYCIT under contract TIC98/0423-C06.
† Granted by Universidad del Cauca, Popayán (Colombia).

The Maximum Entropy principle is an appropriate framework for combining information of a diverse nature from several sources into the same model: the Maximum Entropy model (ME) (Rosenfeld, 1996). The information is incorporated as features which are subject to constraints. The conditional form of the ME model is:

$$p(y \mid x) = \frac{1}{Z(x)} \exp\left\{ \sum_{i=1}^{k} \lambda_i f_i(x, y) \right\} \quad (1)$$

where the $\lambda_i$ are the parameters to be learned (one for each feature), the $f_i$ are usually characteristic functions associated with the features, and $Z(x) = \sum_y \exp\{\sum_{i=1}^{k} \lambda_i f_i(x, y)\}$ is the normalization constant. The main advantages of ME are its flexibility (local and global information can be included in the model) and its simplicity. The drawbacks are that the parameter estimation is computationally expensive, especially the evaluation of the normalization constant $Z(x)$, and that the grammatical information contained in the sentence is poorly encoded in the conditional framework. This is due to the assumption of independence in the conditional events: among the events in the state space, only a part of the information contained in the sentence influences the calculation of the probability (Ristad, 1998).

2 Whole Sentence Maximum Entropy Language Model

An alternative for combining the local, long-distance and structural information contained in the sentence within the maximum entropy framework is the Whole Sentence Maximum Entropy model (WSME) (Rosenfeld, 1997). The ...
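To make equation (1) concrete, the following is a minimal sketch in Python of how $p(y \mid x)$ is computed under a conditional ME model. The feature functions, weights, and vocabulary here are illustrative assumptions, not values from the paper.

```python
import math

def conditional_me_prob(x, y, candidates, features, lambdas):
    """p(y|x) of equation (1): exp{sum_i lambda_i f_i(x,y)} / Z(x).

    features   -- characteristic functions f_i(x, y) -> {0, 1}
    lambdas    -- one learned parameter lambda_i per feature
    candidates -- the outcomes y' summed over in Z(x)
    """
    def unnormalized(y_prime):
        return math.exp(sum(l * f(x, y_prime) for l, f in zip(lambdas, features)))

    z_x = sum(unnormalized(y_prime) for y_prime in candidates)  # Z(x)
    return unnormalized(y) / z_x

# Toy usage (entirely hypothetical): probability of word y given history x.
features = [
    lambda x, y: 1.0 if x[-1] == "new" and y == "york" else 0.0,  # local, n-gram-like
    lambda x, y: 1.0 if "bank" in x and y == "loan" else 0.0,     # long-distance trigger
]
lambdas = [1.5, 0.8]  # in practice, estimated from training data
vocab = ["york", "loan", "house"]

print(conditional_me_prob(("the", "new"), "york", vocab, features, lambdas))
```

Note how $Z(x)$ requires a sum over every candidate outcome; this is the computational expense the text refers to, and it is what sampling methods are brought in to approximate for whole-sentence models.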
Similar papers
Fast parameter estimation for joint maximum entropy language models
This paper discusses efficient parameter estimation methods for joint (unconditional) maximum entropy language models such as whole-sentence models. Such models are a sound framework for formalizing arbitrary linguistic knowledge in a consistent manner. It has been shown that general-purpose gradient-based optimization methods are among the most efficient algorithms for estimating parameters of...
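As a hedged illustration of the kind of gradient-based update this abstract refers to: for exponential models, the gradient of the log-likelihood with respect to $\lambda_i$ is the empirical expectation of $f_i$ minus the model expectation, and for whole-sentence models the latter must itself be approximated by sampling. A minimal sketch, assuming both expectations have already been computed:

```python
def gradient_ascent_step(lambdas, data_expectations, model_expectations, lr=0.1):
    """One gradient-ascent step on the log-likelihood of an exponential model.

    d(log-likelihood)/d(lambda_i) = E_data[f_i] - E_model[f_i];
    for joint whole-sentence models, E_model[f_i] is typically
    estimated from sampled sentences rather than computed exactly.
    """
    return [
        l + lr * (e_data - e_model)
        for l, e_data, e_model in zip(lambdas, data_expectations, model_expectations)
    ]
```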
Full text
Mildly context sensitive grammars for estimating maximum entropy parsing models
The maximum-entropy framework provides great flexibility in specifying what features a model may take into account, making it effective for a wide range of natural language processing tasks. But because parameter estimation in this framework involves computations over the whole space of possible labelings, it is unwieldy for the parsing problem, where this space is very large. Researchers have ...
Full text
Efficient sampling and feature selection in whole sentence maximum entropy language models
Conditional Maximum Entropy models have been successfully applied to estimating conditional language model probabilities, but are often too demanding computationally. Furthermore, the conditional framework does not lend itself to expressing global sentential phenomena. We have recently introduced a non-conditional Maximum Entropy language model which directly models the probability of an entire...
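One commonly used MCMC scheme for sampling sentences from such non-conditional models is an independence Metropolis-Hastings sampler whose proposal distribution is an n-gram model; the sketch below shows one step of it under that assumption. All names here (sample_from_ngram, ngram_prob, wsme_score) are illustrative, not this paper's API.

```python
import random

def metropolis_step(current, sample_from_ngram, ngram_prob, wsme_score):
    """One step of an independence M-H sampler over whole sentences.

    wsme_score(s) must return the *unnormalized* model probability;
    the unknown normalization constant Z cancels in the acceptance
    ratio, which is (p(proposal)/q(proposal)) / (p(current)/q(current)).
    """
    proposal = sample_from_ngram()
    ratio = (wsme_score(proposal) * ngram_prob(current)) / (
        wsme_score(current) * ngram_prob(proposal)
    )
    if random.random() < min(1.0, ratio):
        return proposal  # accept the proposed sentence
    return current       # reject: the chain stays where it is
```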
Full text
Discriminative maximum entropy language model for speech recognition
This paper presents a new discriminative language model based on the whole-sentence maximum entropy (ME) framework. In the proposed discriminative ME (DME) model, we exploit an integrated linguistic and acoustic model, which properly incorporates the features from the n-gram model and the acoustic log-likelihoods of the target and competing models. Through the constrained optimization of the integrated model, ...
Full text
A Whole Sentence Maximum Entropy Language Model
We introduce a new kind of language model, which models whole sentences or utterances directly using the Maximum Entropy paradigm. The new model is conceptually simpler, and more naturally suited to modeling whole-sentence phenomena, than the conditional ME models proposed to date. By avoiding the chain rule, the model treats each sentence or utterance as a "bag of features", where features are...
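The whole-sentence model this abstract describes has the form $p(s) = \frac{1}{Z}\, p_0(s)\, \exp\{\sum_i \lambda_i f_i(s)\}$, where $p_0$ is a prior such as an n-gram model and each $f_i$ inspects the sentence as a whole. A minimal sketch of the unnormalized score, with illustrative arguments supplied by the caller:

```python
import math

def wsme_unnormalized(sentence, p0, features, lambdas):
    """Unnormalized whole-sentence ME probability:
    p0(sentence) * exp{sum_i lambda_i * f_i(sentence)}.

    The global constant Z is intractable to compute exactly and is
    typically sidestepped by sampling (e.g., MCMC or, as in the main
    paper above, perfect sampling).
    """
    return p0(sentence) * math.exp(
        sum(l * f(sentence) for l, f in zip(lambdas, features))
    )
```

Because each feature sees the entire sentence, long-distance and structural cues are as easy to encode as local ones, which is the "bag of features" point made above.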
Full text